Continuous multilinguality with language vectors
Abstract
Most existing models for multilingual natural language processing (NLP) treat language as a discrete category, and make predictions for either one language or the other. In contrast, we propose using continuous vector representations of language. We show that these can be learned efficiently with a character-based neural language model, and used to improve inference about language varieties not seen during training. In experiments with 1303 Bible translations into 990 different languages, we empirically explore the capacity of multilingual language models, and also show that the language vectors capture genetic relationships between languages.
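The central idea of the abstract is that one character-level language model is conditioned on a continuous language vector rather than a discrete language label. A toy forward pass makes this concrete. This is a minimal sketch under stated assumptions, not the authors' model. It assumes the language vector is concatenated to every character embedding. It uses a plain Elman-style tanh recurrence as a stand-in for whatever recurrent layer the paper uses. The dimensions are small and arbitrary, and training is omitted entirely, so the weights are random. The class name LangConditionedCharLM, its methods, and the language labels are hypothetical.

```python
import math
import random


class LangConditionedCharLM:
    """Toy character-level LM conditioned on a continuous language vector.

    Hypothetical sketch: untrained random weights, Elman-style tanh recurrence,
    language vector concatenated to each character embedding.
    """

    def __init__(self, vocab, languages, char_dim=4, lang_dim=2, hidden_dim=8, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            # Small random weights; a real model would learn these by training.
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.vocab = list(vocab)
        self.char_index = {c: i for i, c in enumerate(self.vocab)}
        self.languages = list(languages)
        self.lang_index = {name: i for i, name in enumerate(self.languages)}
        self.char_emb = init(len(self.vocab), char_dim)
        # One continuous vector per language instead of a discrete category.
        self.lang_emb = init(len(self.languages), lang_dim)
        self.W_xh = init(hidden_dim, char_dim + lang_dim)
        self.W_hh = init(hidden_dim, hidden_dim)
        self.W_hy = init(len(self.vocab), hidden_dim)
        self.hidden_dim = hidden_dim

    @staticmethod
    def _matvec(matrix, vector):
        return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

    def next_char_distribution(self, text, language=None, lang_vec=None):
        """Return P(next character | text, language vector) as a dict over the vocabulary."""
        if lang_vec is None:
            lang_vec = self.lang_emb[self.lang_index[language]]
        h = [0.0] * self.hidden_dim
        for c in text:
            # Input at each step: character embedding followed by the language vector.
            x = self.char_emb[self.char_index[c]] + list(lang_vec)
            pre = [a + b for a, b in zip(self._matvec(self.W_xh, x), self._matvec(self.W_hh, h))]
            h = [math.tanh(v) for v in pre]
        logits = self._matvec(self.W_hy, h)
        top = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(z - top) for z in logits]
        total = sum(exps)
        return {c: e / total for c, e in zip(self.vocab, exps)}

    def most_similar(self, language):
        """Rank the other languages by cosine similarity of their vectors."""
        def cosine(u, v):
            dot = sum(a * b for a, b in zip(u, v))
            return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

        query = self.lang_emb[self.lang_index[language]]
        others = [name for name in self.languages if name != language]
        return sorted(
            others,
            key=lambda name: cosine(query, self.lang_emb[self.lang_index[name]]),
            reverse=True,
        )


if __name__ == "__main__":
    lm = LangConditionedCharLM("abc ", ["lang_a", "lang_b", "lang_c"])
    print(lm.next_char_distribution("ab", "lang_a"))
    # A vector halfway between two language vectors, which no discrete label can express.
    mix = [(x + y) / 2 for x, y in zip(lm.lang_emb[0], lm.lang_emb[1])]
    print(lm.next_char_distribution("ab", lang_vec=mix))
    print(lm.most_similar("lang_a"))
```

The language enters the forward pass only as a vector, so no step requires a discrete label. Any vector of the right size can be supplied through lang_vec, for instance one estimated for a variety that had no label during training. The most_similar helper ranks languages by the cosine similarity of their vectors. After training, this is one simple way to inspect whether related languages end up close together.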
Similar resources
Multilinguality in a Text Generation System For Three Slavic Languages
This paper describes a multilingual text generation system in the domain of CAD/CAM software instructions for Bulgarian, Czech and Russian. Starting from a language-independent semantic representation, the system drafts natural, continuous text as typically found in software manuals. The core modules for strategic and tactical generation are implemented using the KPML platform for linguistic re...
Language Independent Methodologies to Tackle Multilinguality
Until now, Natural Language Processing (NLP) research development has mainly been conducted for the English speaking community. However, the European Union with its 25 member-states already involves 22 different official languages. As a consequence, multilinguality is certainly the most important challenge of this century for the European NLP community. In this paper, we show how the Centre for...
Multilinguality in Electronic Commerce - Research Issues
We outline how language technology could be used to support multilinguality in electronic commerce. The growth in international trade and the increasing use of EDI in trade transactions brings new technical challenges also to language technology. Nowadays, trade transactions frequently cross language borders. While ever larger number of users get connected to electronic trading processes, it is...
Against multilinguality
1. Introduction. An obvious assumption of the present workshop is that multilingual corpora are useful, and should be built and investigated. In the present paper, I would like to point out that this is far from straightforward and actually remains to be proved. In addition, and in a more constructive vein, I want to present some examples that show that the right encoding depends crucially on wh...
Large Vocabulary Continuous Speech Recognition (LV-CSR)
Multilinguality need not be textual only, but will take on spoken form, when information services are to extend beyond national boundaries, or across language groups. Database access by speech will need to handle multiple languages to service customers from different language groups within a country or travelers from abroad. Public service operators (emergency, police, department of transportat...
Publication date: 2017